
    Revised Annotations, Sex-Biased Expression, and Lineage-Specific Genes in the Drosophila melanogaster group

    Here, we provide revised gene models for D. ananassae, D. yakuba, and D. simulans, which include UTRs and empirically verified intron-exon boundaries, as well as ortholog groups identified using a fuzzy reciprocal-best-hit blast comparison. Using these revised annotations, we perform differential expression testing using the cufflinks suite to provide a broad overview of differential expression between reproductive tissues and the carcass. We identify thousands of genes that are differentially expressed across tissues in D. yakuba and D. simulans, with roughly 60% agreement in expression patterns of orthologs in D. yakuba and D. simulans. We identify several cases of putative polycistronic transcripts, pointing to a combination of transcriptional read-through in the genome as well as putative gene fusion and fission events across taxa. We furthermore identify hundreds of lineage specific genes in each species with no blast hits among transcripts of any other Drosophila species, which are candidates for neofunctionalized proteins and a potential source of genetic novelty. Comment: Revised manuscript, also available online preprint at G3: Genes, Genomes, Genetics. Gene models, ortholog calls, and tissue specific expression results are available at http://github.com/ThorntonLab/GFF or the UCSC browser on the Thornton Lab public track hub at http://genome.ucsc.ed
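
    The reciprocal-best-hit idea mentioned in the abstract above can be illustrated with a minimal sketch. This is plain reciprocal best hit on bit scores, not the authors' "fuzzy" variant, whose criteria are not given here. The function names, the `(query, subject, score)` tuple layout, and the gene IDs are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical hit record: (query_id, subject_id, bit_score).
Hit = Tuple[str, str, float]


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each query to its single highest-scoring subject."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit]) -> List[Tuple[str, str]]:
    """Return (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


# Made-up example hits between two species (IDs are illustrative only).
example_a_vs_b = [
    ("yak1", "sim1", 200.0),
    ("yak1", "sim2", 150.0),
    ("yak2", "sim2", 90.0),
    ("yak3", "sim2", 120.0),
]
example_b_vs_a = [
    ("sim1", "yak1", 198.0),
    ("sim2", "yak3", 118.0),
    ("sim2", "yak2", 95.0),
]

print(reciprocal_best_hits(example_a_vs_b, example_b_vs_a))
```

    In this toy input, yak2 has sim2 as its best hit, but sim2's best hit is yak3, so yak2 is left without an ortholog call. A relaxed ("fuzzy") scheme would presumably treat such near-ties differently.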

    Landscape of standing variation for tandem duplications in Drosophila yakuba and Drosophila simulans

    We have used whole genome paired-end Illumina sequence data to identify tandem duplications in 20 isofemale lines of D. yakuba, and 20 isofemale lines of D. simulans and performed genome wide validation with PacBio long molecule sequencing. We identify 1,415 tandem duplications that are segregating in D. yakuba as well as 975 duplications in D. simulans, indicating greater variation in D. yakuba. Additionally, we observe high rates of secondary deletions at duplicated sites, with 8% of duplicated sites in D. simulans and 17% of sites in D. yakuba modified with deletions. These secondary deletions are consistent with the action of the large loop mismatch repair system acting to remove polymorphic tandem duplication, resulting in rapid dynamics of gain and loss in duplicated alleles and a richer substrate of genetic novelty than has been previously reported. Most duplications are present in only single strains, suggesting deleterious impacts are common. D. simulans shows larger numbers of whole gene duplications in comparison to larger proportions of gene fragments in D. yakuba. D. simulans displays an excess of high frequency variants on the X chromosome, consistent with adaptive evolution through duplications on the D. simulans X or demographic forces driving duplicates to high frequency. We identify 78 chimeric genes in D. yakuba and 38 chimeric genes in D. simulans, as well as 143 cases of recruited non-coding sequence in D. yakuba and 96 in D. simulans, in agreement with rates of chimeric gene origination in D. melanogaster. Together, these results suggest that tandem duplications often result in complex variation beyond whole gene duplications that offers a rich substrate of standing variation that is likely to contribute both to detrimental phenotypes and disease, as well as to adaptive evolutionary change. Comment: Revised Version- Accepted at Molecular Biology and Evolutio

    Abundance and Distribution of Transposable Elements in Two Drosophila QTL Mapping Resources

    Here we present computational machinery to efficiently and accurately identify transposable element (TE) insertions in 146 next-generation sequenced inbred strains of Drosophila melanogaster. The panel of lines we use in our study is composed of strains from a pair of genetic mapping resources: the Drosophila Genetic Reference Panel (DGRP) and the Drosophila Synthetic Population Resource (DSPR). We identified 23,087 TE insertions in these lines, of which 83.3% are found in only one line. There are marked differences in the distribution of elements over the genome, with TEs found at higher densities on the X chromosome, and in regions of low recombination. We also identified many more TEs per base pair of intronic sequence and fewer TEs per base pair of exonic sequence than expected if TEs are located at random locations in the euchromatic genome. There was substantial variation in TE load across genes. For example, the paralogs derailed and derailed-2 show a significant difference in the number of TE insertions, potentially reflecting differences in the selection acting on these loci. When considering TE families, we find a very weak effect of gene family size on TE insertions per gene, indicating that as gene family size increases the number of TE insertions in a given gene within that family also increases. TEs are known to be associated with certain phenotypes, and our data will allow investigators using the DGRP and DSPR to assess the functional role of TE insertions in complex trait variation more generally. Notably, because most TEs are very rare and often private to a single line, causative TEs resulting in phenotypic differences among individuals may typically fail to replicate across mapping panels since individual elements are unlikely to segregate in both panels. Our data suggest that “burden tests” that test for the effect of TEs as a class may be more fruitful

    Approaches to Selecting "Time Zero" in External Control Arms with Multiple Potential Entry Points: A Simulation Study of 8 Approaches

    Background: When including data from an external control arm to estimate comparative effectiveness, there is a methodological choice of when to set “time zero,” the point at which a patient would be eligible/enrolled in a contemporary study. Where patients receive multiple lines of eligible therapy and thus alternative points could be selected, this issue is complex. Methods: A simulation study was conducted in which patients received multiple prior lines of therapy before entering either cohort. The results from the control and intervention data sets are compared using 8 methods for selecting time zero. The base-case comparison was set up to be biased against the intervention (which is generally received later), with methods compared in their ability to estimate the true intervention effectiveness. We further investigate the impact of key study attributes (such as sample size) and degree of overlap in time-varying covariates (such as prior lines of therapy) on study results. Results: Of the 8 methods, 5 (all lines, random line, systematically selecting groups based on mean absolute error, root mean square error, or propensity scores) showed good performance in accounting for differences between the line at which patients were included. The first eligible line can be statistically inefficient in some situations. All lines (with censoring) cannot be used for survival outcomes. The last eligible line cannot be recommended. Conclusions: Multiple methods are available for selecting the most appropriate time zero from an external control arm. Based on the simulation, we demonstrate that some methods frequently perform poorly, with several viable methods remaining. In selecting between the viable methods, analysts should consider the context of their analysis and justify the approach selected. There are multiple methods available from which an analyst may select “time zero” in an external control cohort. 
This simulation study demonstrates that some methods perform poorly but most are viable options, depending on context and the degree of overlap in time zero across cohorts. Careful thought and clear justification should be used when selecting the strategy for a study
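
    To make the "time zero" choice concrete, here is a minimal sketch of four of the selection rules named in the abstract (first eligible line, last eligible line, random line, all lines). It only shows how rows would be chosen from an external control cohort; it does not reproduce the paper's simulation, the censoring step, or the error-based and propensity-score methods. The `Line` record, the `select_time_zero` function, and the example values are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Line:
    """One eligible line of therapy for one external-control patient (hypothetical schema)."""
    patient_id: str
    line_number: int
    start_day: int  # candidate "time zero" for this line


def select_time_zero(lines: List[Line], method: str,
                     rng: Optional[random.Random] = None) -> List[Line]:
    """Pick the line(s) whose start defines time zero for each patient."""
    rng = rng or random.Random()
    by_patient: Dict[str, List[Line]] = {}
    for line in lines:
        by_patient.setdefault(line.patient_id, []).append(line)

    selected: List[Line] = []
    for patient_lines in by_patient.values():
        ordered = sorted(patient_lines, key=lambda l: l.line_number)
        if method == "first":
            selected.append(ordered[0])
        elif method == "last":
            selected.append(ordered[-1])
        elif method == "random":
            selected.append(rng.choice(ordered))
        elif method == "all":
            # Every eligible line becomes its own row; the same patient can appear repeatedly.
            selected.extend(ordered)
        else:
            raise ValueError(f"unknown method: {method}")
    return selected


# Made-up cohort: p1 is eligible at lines 2, 3 and 4; p2 only at line 3.
cohort = [
    Line("p1", 2, 100), Line("p1", 3, 220), Line("p1", 4, 400),
    Line("p2", 3, 150),
]

for name in ("first", "last", "random", "all"):
    rows = select_time_zero(cohort, name, rng=random.Random(0))
    print(name, [(r.patient_id, r.line_number) for r in rows])
```

    Under "all", a patient contributes several rows, so any downstream analysis would need to account for repeated observations of the same person.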

    Quantification of habitat fragmentation reveals extinction risk in terrestrial mammals

    Although habitat fragmentation is often assumed to be a primary driver of extinction, global patterns of fragmentation and its relationship to extinction risk have not been consistently quantified for any major animal taxon. We developed high-resolution habitat fragmentation models and used phylogenetic comparative methods to quantify the effects of habitat fragmentation on the world's terrestrial mammals, including 4,018 species across 26 taxonomic Orders. Results demonstrate that species with more fragmentation are at greater risk of extinction, even after accounting for the effects of key macroecological predictors, such as body size and geographic range size. Species with higher fragmentation had smaller ranges and a lower proportion of high-suitability habitat within their range, and most high-suitability habitat occurred outside of protected areas, further elevating extinction risk. Our models provide a quantitative evaluation of extinction risk assessments for species, allow for identification of emerging threats in species not classified as threatened, and provide maps of global hotspots of fragmentation for the world's terrestrial mammals. Quantification of habitat fragmentation will help guide threat assessment and strategic priorities for global mammal conservation

    Folding and organization of a contiguous chromosome region according to the gene distribution pattern in primary genomic sequence

    Specific mammalian genes functionally and dynamically associate together within the nucleus. Yet, how an array of many genes along the chromosome sequence can be spatially organized and folded together is unknown. We investigated the 3D structure of a well-annotated, highly conserved 4.3-Mb region on mouse chromosome 14 that contains four clusters of genes separated by gene “deserts.” In nuclei, this region forms multiple, nonrandom “higher order” structures. These structures are based on the gene distribution pattern in primary sequence and are marked by preferential associations among multiple gene clusters. Associating gene clusters represent expressed chromatin, but their aggregation is not simply dependent on ongoing transcription. In chromosomes with aggregated gene clusters, gene deserts preferentially align with the nuclear periphery, providing evidence for chromosomal region architecture by specific associations with functional nuclear domains. Together, these data suggest dynamic, probabilistic 3D folding states for a contiguous megabase-scale chromosomal region, supporting the diverse activities of multiple genes and their conserved primary sequence organization

    Validation of Rearrangement Break Points Identified by Paired-End Sequencing in Natural Populations of Drosophila melanogaster

    Several recent studies have focused on the evolution of recently duplicated genes in Drosophila. Currently, however, little is known about the evolutionary forces acting upon duplications that are segregating in natural populations. We used a high-throughput, paired-end sequencing platform (Illumina) to identify structural variants in a population sample of African D. melanogaster. Polymerase chain reaction and sequencing confirmation of duplications detected by multiple, independent paired-ends showed that paired-end sequencing reliably uncovered the break points of structural rearrangements and allowed us to identify a number of tandem duplications segregating within a natural population. Our confirmation experiments show that rates of confirmation are very high, even at modest coverage. Our results also compare well with previous studies using microarrays (Emerson J, Cardoso-Moreira M, Borevitz JO, Long M. 2008. Natural selection shapes genome wide patterns of copy-number polymorphism in Drosophila melanogaster. Science. 320:1629–1631. and Dopman EB, Hartl DL. 2007. A portrait of copy-number polymorphism in Drosophila melanogaster. Proc Natl Acad Sci U S A. 104:19920–19925.), which both gives us confidence in the results of this study as well as confirms previous microarray results

    The Atacama Cosmology Telescope: Temperature and Gravitational Lensing Power Spectrum Measurements from Three Seasons of Data

    We present the temperature power spectra of the cosmic microwave background (CMB) derived from the three seasons of data from the Atacama Cosmology Telescope (ACT) at 148 GHz and 218 GHz, as well as the cross-frequency spectrum between the two channels. We detect and correct for contamination due to the Galactic cirrus in our equatorial maps. We present the results of a number of tests for possible systematic error and conclude that any effects are not significant compared to the statistical errors we quote. Where they overlap, we cross-correlate the ACT and the South Pole Telescope (SPT) maps and show they are consistent. The measurements of higher-order peaks in the CMB power spectrum provide an additional test of the Lambda CDM cosmological model, and help constrain extensions beyond the standard model. The small angular scale power spectrum also provides constraining power on the Sunyaev-Zel'dovich effects and extragalactic foregrounds. We also present a measurement of the CMB gravitational lensing convergence power spectrum at 4.6-sigma detection significance. Comment: 21 pages; 20 figures, Submitted to JCAP, some typos correcte